Lecture 3.4 - Hypothesis Testing Wisdom

Author

Student

Published

September 24, 2024

Hypothesis Testing Wisdom

Setting Expectations

Today we are working with a dataset on wages in the U.S. collected by the U.S. Department of Labor: us.dol.wages.csv

We will first assume that the dataset approximately represents all workers in the U.S., so means and proportions computed from the full dataset can serve as stand-ins for the true population parameters.

  1. Subset your data to the South region only and take a sample of 100 workers using the slice_sample() command from dplyr:
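library(dplyr)  # provides %>%, filter(), and slice_sample()

# Keep only workers in the South, then draw 100 rows at random (with replacement)
south_sample <- us.dol.wages %>%
    filter(south == "yes") %>%
    slice_sample(n = 100, replace = TRUE)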
  2. Write down your expectations for one of the following variables. Use Google if needed.
     ed - education (years)
     wage - salary per week
     bluecol - whether the worker works in a blue collar job
     union - whether the person is in a union

You can find the proportions/means of these variables with the table() and summary() commands.
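As a minimal sketch of those two commands, the block below runs them on a small simulated stand-in for the data, since us.dol.wages.csv is not reproduced here. The data frame `wages_demo` and all of its values are assumptions made up for illustration; only the column names follow the variable list above.

```r
# Simulated stand-in for us.dol.wages (hypothetical values, for illustration only).
set.seed(42)
wages_demo <- data.frame(
  ed      = sample(8:18, 500, replace = TRUE),
  wage    = round(rnorm(500, mean = 900, sd = 250)),
  bluecol = sample(c("yes", "no"), 500, replace = TRUE),
  union   = sample(c("yes", "no"), 500, replace = TRUE, prob = c(0.3, 0.7)),
  south   = sample(c("yes", "no"), 500, replace = TRUE, prob = c(0.3, 0.7))
)

# Numeric variable: summary() reports the mean alongside the quartiles.
summary(wages_demo$wage)

# Categorical variable: table() gives counts; prop.table() converts them to proportions.
union_counts <- table(wages_demo$union)
union_props  <- prop.table(union_counts)
union_props
```

Running the same calls on the full dataset gives the "population" values, and running them on south_sample gives the sample statistics.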

Confidence intervals, \(p\) values, and \(\alpha\) values

First, consider the issue of an \(\alpha\) value.

  1. What, in your opinion, is a reasonable choice for the \(\alpha\) value (i.e., how strong would the evidence need to be to convince you that there is a ‘real’ difference between the sample and the population)? Write down some reasons for your choice.

  2. Now, generate alternative and null hypotheses for your chosen variable, fully specifying the \(\alpha\) level and the tailed-ness of the test. The null value should be the overall population proportion or mean.

  3. Make a note of why you chose a one-tailed or two-tailed hypothesis test.

  4. Next, make a small table with both the \(p\) value and the confidence interval for your variable. Check the conditions for hypothesis testing.

  5. Interpret your \(p\) value and confidence interval with respect to your \(\alpha\) level. Based on your sample, do you conclude that the South differs from the overall population on your variable to a statistically significant degree?

  6. Add two additional columns to your table with the ‘true’ value of the variable for just the South region and the ‘true’ value of your variable for the overall dataset. Were your conclusions correct or not?
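The steps above can be sketched for a numeric variable such as wage using base R's t.test(). This is only an illustration under stated assumptions: the vectors `pop_wage` and `south_wage` are simulated stand-ins (not the real data), \(\alpha = 0.05\) and a two-tailed test are example choices, and the column names of `results` are hypothetical.

```r
set.seed(1)
# Hypothetical stand-ins: overall weekly wages and the South-only subset.
pop_wage     <- rnorm(4000, mean = 900, sd = 250)
south_wage   <- rnorm(1200, mean = 850, sd = 250)
south_sample <- sample(south_wage, 100, replace = TRUE)

alpha <- 0.05            # example choice; use the level you justified in step 1
mu0   <- mean(pop_wage)  # null value: the overall population mean

# Conditions: random sample, and n = 100 is large enough for the t procedure
# to be reasonable unless the variable is extremely skewed.
# Two-tailed one-sample t test; conf.level = 1 - alpha ties the CI to alpha.
res <- t.test(south_sample, mu = mu0,
              alternative = "two.sided", conf.level = 1 - alpha)

# Small results table, including the 'true' values from step 6.
results <- data.frame(
  variable     = "wage",
  p_value      = res$p.value,
  ci_lower     = res$conf.int[1],
  ci_upper     = res$conf.int[2],
  true_south   = mean(south_wage),
  true_overall = mu0
)
results

# For a two-tailed test these two decisions agree.
reject_null      <- res$p.value < alpha
ci_excludes_null <- mu0 < res$conf.int[1] || mu0 > res$conf.int[2]
```

For a yes/no variable such as union, one option is prop.test(x, n, p = p0), where x is the number of "yes" responses in the sample, n is the sample size, and p0 is the overall population proportion.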

Errors & Effect Size

For the same variable, put yourself in the shoes of a policymaker who is considering additional government programs to help people in the South region if it can be shown that they differ on your chosen variable.

  1. Add another column to your table. Assess whether you think a Type I or Type II error would be more serious. Provide a justification for why.

  2. Add another column to your table with the size of the difference. Assess whether the difference is substantively large or not. How do you know if that is a large difference? You may want to check Google, etc. to see what the normal range of variation for your variable is.

  3. Overall, with your partner, write a summary paragraph that reports your findings and explains how they should be interpreted, given that they are based on a 100-person sample of Southern workers.
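One way to fill in the "size of the difference" column is sketched below. The sample and the null value `mu0` are simulated assumptions, not real results, and the standardized measure shown (the difference divided by the sample standard deviation, often called Cohen's d) is just one common option rather than a required method.

```r
set.seed(2)
# Hypothetical sample of 100 weekly wages from the South and a hypothetical
# overall population mean.
south_sample <- rnorm(100, mean = 850, sd = 250)
mu0 <- 900

raw_diff <- mean(south_sample) - mu0   # difference in the variable's own units
pct_diff <- 100 * raw_diff / mu0       # difference as a percent of the population mean
cohens_d <- raw_diff / sd(south_sample) # difference in standard-deviation units

effect_table <- data.frame(raw_diff, pct_diff, cohens_d)
effect_table
```

Whether a given difference is substantively large still depends on context, such as the normal range of variation for the variable; the numbers only make that comparison easier.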

Extra time

  1. Repeat this process for a second variable, adding another row to your table.